多对象跟踪(MOT)是一项具有挑战性的任务,涉及检测场景中的对象并通过一系列帧跟踪它们。由于时间阻塞以及一系列图像序列的变化,评估此任务很困难。 Kitti等数据集上基准MOT方法的主要评估度量已成为高阶跟踪准确性(HOTA)度量,该指标能够更好地描述MOTA,DETA和IDF1等指标的性能。点检测和跟踪是一项密切相关的任务,可以将其视为对象检测的特殊情况。但是,评估检测任务本身(点距离与边界框重叠)存在差异。当包括时间维度和多视图方案时,评估任务变得更加复杂。在这项工作中,我们提出了一个多视图高阶跟踪指标(MVHOTA),以确定多点(多企业和多级)检测的准确性,同时考虑到时间和空间关联。 MVHOTA可以解释为检测,关联和对应准确性的几何平均值,从而为每个因素提供相等的权重。我们通过以前有组织的医疗挑战中的公开内窥镜检测数据集证明了用例。此外,我们与此用例的其他调整后的MOT指标进行比较,讨论MVHOTA的属性,并展示提出的对应准确性和闭塞指数如何促进对闭塞处理方法的分析。该代码将公开可用。
translated by 谷歌翻译
目的:二尖瓣修复是心脏瓣膜的复杂微创手术。在这种情况下,来自内窥镜图像的缝合线检测是一种高度相关的任务,该任务提供了分析缝合模式的定量信息,评估假肢配置并产生增强的现实可视化。面部或解剖标志性的检测任务通常包含固定数量的地标,并使用回归或固定的基于热线图的方法来定位标志性。然而,在内窥镜检查中,每个图像中存在不同数量的缝合线,并且缝合线可能发生在环形空中的任何位置,因为它们不是语义唯一的。方法:在这项工作中,我们将缝合检测任务制定为多实例的深热映射回归问题,以识别缝合线的进入和退出点。我们扩展了我们以前的工作,并介绍了一个新颖的使用2D高斯层,然后是可分辨率的2D空间软氩模层作为局部非最大抑制。结果:我们用多种热映射分布功能和所提出的模型的两个变体呈现广泛的实验。在术中帧内结构域中,变体1在基线上显示了+0.0422的平均f1。类似地,在模拟器域中,变体1在基线上显示了+0.0865的平均f1。结论:拟议的模型显示出在帧内和模拟器域中的基线上的改进。在Miccai Adaptor2021挑战HTTPS://Adaptor2021.github.io/的范围内公开可用,以及https://github.com/cardio-ai/suture-detection-pytorch/的代码。 DOI:10.1007 / S11548-021-02523-W。可以在此处找到与开放式接入文章的链接:https://link.springer.com/article/10.1007%2FS11548-021-02523
translated by 谷歌翻译
In this new computing paradigm, named quantum computing, researchers from all over the world are taking their first steps in designing quantum circuits for image processing, through a difficult process of knowledge transfer. This effort is named Quantum Image Processing, an emerging research field pushed by powerful parallel computing capabilities of quantum computers. This work goes in this direction and proposes the challenging development of a powerful method of image denoising, such as the Total Variation (TV) model, in a quantum environment. The proposed Quantum TV is described and its sub-components are analysed. Despite the natural limitations of the current capabilities of quantum devices, the experimental results show a competitive denoising performance compared to the classical variational TV counterpart.
translated by 谷歌翻译
密切的人类机器人互动(HRI),尤其是在工业场景中,已经对结合人类和机器人技能的优势进行了广泛的研究。对于有效的HRI,应质疑当前可用的人机通信媒体或工具的有效性,并应探讨新的交流方式。本文提出了一个模块化体系结构,允许人类操作员通过不同的方式与机器人互动。特别是,我们使用智能手表和平板电脑分别实施了架构来分别处理手势和触摸屏输入。最后,我们在这两种方式之间进行了比较用户体验研究。
translated by 谷歌翻译
语言模型既展示了定量的改进,又展示了新的定性功能,随着规模的增加。尽管它们具有潜在的变革性影响,但这些新能力的特征却很差。为了为未来的研究提供信息,为破坏性的新模型能力做准备,并改善社会有害的效果,至关重要的是,我们必须了解目前和近乎未来的能力和语言模型的局限性。为了应对这一挑战,我们介绍了超越模仿游戏基准(Big Bench)。 Big Bench目前由204个任务组成,由132家机构的442位作者贡献。任务主题是多样的,从语言学,儿童发展,数学,常识性推理,生物学,物理学,社会偏见,软件开发等等。 Big-Bench专注于被认为超出当前语言模型的功能的任务。我们评估了OpenAI的GPT型号,Google内部密集变压器体系结构和大型基础上的开关稀疏变压器的行为,跨越了数百万到数十亿个参数。此外,一个人类专家评估者团队执行了所有任务,以提供强大的基准。研究结果包括:模型性能和校准都随规模改善,但绝对的术语(以及与评估者的性能相比);在模型类中的性能非常相似,尽管带有稀疏性。逐渐和预测的任务通常涉及大量知识或记忆成分,而在临界规模上表现出“突破性”行为的任务通常涉及多个步骤或组成部分或脆性指标;社交偏见通常会随着含糊不清的环境而随着规模而增加,但这可以通过提示来改善。
translated by 谷歌翻译
两区图像分割是将图像分为两个感兴趣的区域,即前景和背景的过程。为此,Chan等人。[Chan,Esedo \ = Glu,Nikolova,Siam on Applied Mathematics 66(5),1632-1648,2006]设计了一个非常适合平滑图像的模型。该模型的一个缺点是,当图像包含振荡组件时,它可能会产生不良的分割。基于要分割的图像的卡通文本分解,我们提出了一个新模型,该模型能够对图像进行准确的分割,其中还包含噪声或振荡信息(例如纹理)。新型模型导致了一个非平滑约束优化问题,我们通过ADMM方法解决了该问题。还证明了数值方案的收敛性。关于平滑,嘈杂和纹理图像的几项实验显示了所提出的模型的有效性。
translated by 谷歌翻译
阿拉伯联合酋长国阿布扎比技术创新研究所最近完成了一辆新的无人面车辆的生产和测试,称为Nukhada,专门用于自主调查,检查和对水下行动的支持。此稿件描述了Nukhada USV的主要特征,以及在开发期间进行的一些试验。
translated by 谷歌翻译
我们描述了作为黑暗机器倡议和LES Houches 2019年物理学研讨会进行的数据挑战的结果。挑战的目标是使用无监督机器学习算法检测LHC新物理学的信号。首先,我们提出了如何实现异常分数以在LHC搜索中定义独立于模型的信号区域。我们定义并描述了一个大型基准数据集,由> 10亿美元的Muton-Proton碰撞,其中包含> 10亿美元的模拟LHC事件组成。然后,我们在数据挑战的背景下审查了各种异常检测和密度估计算法,我们在一组现实分析环境中测量了它们的性能。我们绘制了一些有用的结论,可以帮助开发无监督的新物理搜索在LHC的第三次运行期间,并为我们的基准数据集提供用于HTTPS://www.phenomldata.org的未来研究。重现分析的代码在https://github.com/bostdiek/darkmachines-unsupervisedChallenge提供。
translated by 谷歌翻译
Advances in computer vision and machine learning techniques have led to significant development in 2D and 3D human pose estimation from RGB cameras, LiDAR, and radars. However, human pose estimation from images is adversely affected by occlusion and lighting, which are common in many scenarios of interest. Radar and LiDAR technologies, on the other hand, need specialized hardware that is expensive and power-intensive. Furthermore, placing these sensors in non-public areas raises significant privacy concerns. To address these limitations, recent research has explored the use of WiFi antennas (1D sensors) for body segmentation and key-point body detection. This paper further expands on the use of the WiFi signal in combination with deep learning architectures, commonly used in computer vision, to estimate dense human pose correspondence. We developed a deep neural network that maps the phase and amplitude of WiFi signals to UV coordinates within 24 human regions. The results of the study reveal that our model can estimate the dense pose of multiple subjects, with comparable performance to image-based approaches, by utilizing WiFi signals as the only input. This paves the way for low-cost, broadly accessible, and privacy-preserving algorithms for human sensing.
translated by 谷歌翻译
Due to the environmental impacts caused by the construction industry, repurposing existing buildings and making them more energy-efficient has become a high-priority issue. However, a legitimate concern of land developers is associated with the buildings' state of conservation. For that reason, infrared thermography has been used as a powerful tool to characterize these buildings' state of conservation by detecting pathologies, such as cracks and humidity. Thermal cameras detect the radiation emitted by any material and translate it into temperature-color-coded images. Abnormal temperature changes may indicate the presence of pathologies, however, reading thermal images might not be quite simple. This research project aims to combine infrared thermography and machine learning (ML) to help stakeholders determine the viability of reusing existing buildings by identifying their pathologies and defects more efficiently and accurately. In this particular phase of this research project, we've used an image classification machine learning model of Convolutional Neural Networks (DCNN) to differentiate three levels of cracks in one particular building. The model's accuracy was compared between the MSX and thermal images acquired from two distinct thermal cameras and fused images (formed through multisource information) to test the influence of the input data and network on the detection results.
translated by 谷歌翻译